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ABSTRACT 

ATNoSFERES is a Pittsburgh style Learning Classifier Sys- 
tem (LCS) in which the rules are represented as edges of 
an Augmented Transition Network. Genotypes are strings 
of tokens of a stack-based language, whose execution builds 
the labeled graph. The original ATNoSFERES, using a bit- 
string to represent the language tokens, has been favorably 
compared in previous work to several Michigan style LCSs 
architectures in the context of Non Markov problems. Sev- 
eral modifications of ATNoSFERES are proposed here: the 
most important one conceptually being a representational 
change: each token is now represented by an integer, hence 
the genotype is a string of integers; several other modifica- 
tions of the underlying grammar language are also proposed. 
The resulting ATNoSFERES-II is validated on several stan- 
dard animat Non Markov problems, on which it outperforms 
all previously published results in the LCS literature. The 
reasons for these improvement are carefully analyzed, and 
some assumptions are proposed on the underlying mecha- 
nisms in order to explain these good results. 

Categories and Subject Descriptors 

1.2.8 [Artificial Intelligence]: Problem Solving, Control 
Methods, and Search 

General Terms 

Experimentation 
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1. INTRODUCTION 

The Pittsburgh versus Michigan debate hcis been present 
in the Learning Classifier Systems (LCSs) community since 



the very beginning of this research area (see |14| for a presen- 
tation). On the one hand, most recent works in the LCS do- 
main have chosen the Michigan approach |7|. The ATNoS- 
FERES architecture proposed in 00, on the other hand, 
is a Pittsburgh style architecture dedicated to the resolution 
of non Markov problems. It has been compared to several 
LCSs, in particular in |3, where the authors concluded a 
systematic comparison by the claim that the ATNoSFERES 
approach was more robust and more general than its Michi- 
gan style opponents, but that it was also several orders of 
magnitude slower. 

This paper investigates the effect of several modifications 
to the original ATNoSFERES approach on its global effi- 
ciency. Instead of being represented by a bitstring, the geno- 
type, that encodes a series of tokens of the chosen stack-base 
graph-building language, is now a string of integers. This 
new representation greatly impacts on the mutation, that 
can now be naturally made uniform on the set of tokens. 
Moreover, all stack tokens of the underlying language can 
now operate on specific data on the stack. Finally, the ran- 
dom default behavior when no edge is eligible in the current 
node of the automaton is suppressed, and the contradictions 
in the conditions of all edges of the control graph are filtered 
out. Because all those changes tend to make ATNoSFERES 
representation and behavior closer to the actual semantic 
of the underlying search space - that of the ATNs - it is 
hoped that the resulting algorithm, termed ATNoSFERES- 
II, achieves at least as good results as ATNoSFERES, but 
much faster. 

The paper is organized as follows: the next section gives 
a short overview of the previously published versions of AT- 
NoSFERES. Section 13 details the modifications introduced 
in this paper. Section 2] indicates the experimental condi- 
tions used to test ATNoSFERES-II on classical Non Markov 
animat problems. Section ^ presents the results obtained 
by ATNoSFERES-II, compares them to the state-of-the-art 
LCS algorithms and produces statistical significance tests of 
the improvements. Finally, all those results are discussed in 
section |n] and some conclusions are draw on the impact of 
the proposed modifications. 
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2. OVERVIEW OF ATNoSFERES 
2.1 The bitstring representation 

The ATNoSFERES model |5] is designed to generate the 
control architecture of agents thanks to an evolutionary al- 
gorithm. The control architecture itself is represented as an 



ATN ^ graph where nodes represent states and edges repre- 
sent transitions of an automaton. The graph describing the 
behaviors is built by interpreting a genotype as a program 
in a stack-based language IT] that proceeds by adding nodes 
and edges to a basic structure initially containing only the 
Start and End nodes. This genotype is a bitstring in the 
original ATNoSFERES model. 

The graph-building process operates in two steps. First, 
during translation, the genotype is translated into a sequence 
of tokens (see table 0. Second, during interpretation, the 
token interpreter is fed with the token stream produced by 
the translator. The tokens are interpreted one by one as 
instructions of a robust programming language dedicated to 
graph building. The interpretation of each successive token 
operates on a stack in which parts of the future graph are 
stored. The construction of the graph takes place during this 
interpretation process, by creating nodes and connections 
between nodes. When all tokens have been interpreted, the 
nodes (each one carrying connections to other nodes) are 
popped from the stack and the graph is ready to use. 



a be 
tokens stack 




Figure 1: Principles of the genetic expression used 
to produce the behavioral graph from the bitstring 
genotype. The string is first decoded into tokens (a), 
which are interpreted in a second step as instruc- 
tions (b) to create nodes, edges, and labels. Finally, 
when all tokens have been interpreted, the unused 
conditions and actions remaining in the stack are 
added into the structure and the structure is popped 
from the stack (c). 

There are three kinds of tokens: action and condition to- 
kens, such as SW! or foodNE?, that are action and condi- 
tion labels for the edges of the ATNs, specific to the actions 
(movements) and perceptions (what is in a nearby cell) of 
the agent in the maze environments; graph structure tokens, 
such as node or connect, that are instructions to push nodes 
and connect them with edges, using the labels in the stack; 
and finally stack tokens, such as swap all or roll node, that 
manipulate the stack either as a whole (the all family of 
stack tokens) or by acting only on one of the two kinds of 
data present in the stack (the node and the label families of 
stack tokens). 

2.2 The evolutionary framework 

A generation of ATNoSFERES classically starts with the 
selection of the parents that will be selected for reproduc- 
tion, then applies variation operators (crossover and muta- 
tion) to these lucky parents. 

The selection is a rank-based truncated selection that pro- 
ceeds as follows: First, the n best individuals of the popula- 
tion (of size P) are retained (deterministic truncation), and 
sorted by descending fitness. They will be used to generate 

^Augmented Transition Network |15| 



P — n offspring that will complete the population. Then, in 
order to generate each pair of offspring, two parents are se- 
lected from those n best, based on a rank-based exponential 
selection: if the best parent is associated with a bias b, the 
i*'' best parent is associated with bias b.c' , for some c g]0, 1[. 
Thus, of course b, c and n must satisfy "^"Zq b.d = 1. In 
all experiments described throughout this paper, P — 300, 
n — 60, and c was chosen so that the selection pressure after 
truncation (bias from the best to the worst of the n parents) 
is 2 (i.e. c = ^5). 

After the selection described above, the genotype of two 
offspring is produced by a 2-points crossover between the 
two genotypes, where crossover points must be at the border 
between two tokens. 

Additionally, in the original ATNoSFERES approach, two 
different mutation strategies were used: classical bit-flip mu- 
tation, and random insertions or deletions of one codon 
(i.e. one token of the language at hand). However, using 
add/delete mutations did not improve the performances of 
ATNoSFERES, and only the bit-flip mutation was ever used. 
This modifies the sequence of tokens produced by transla- 
tion, so that the complexity of the graph itself may change. 
Nodes or edges can hence be added or removed by the evolu- 
tionary process, as can condition/action labels on the edges. 

2.3 The evaluation function 

The fitness of each genotype is assessed by first building 
the ATN, as described above, then by putting this ATN as 
the controller of an agent and evaluating the behavior of 
the agent in a (Non Markov) environment. There are three 
sources of non-determinism in the use of the ATN, there- 
fore, each parent is re-evaluated together with the offspring, 
and the fitness is averaged over successive evaluations (by 
calculating the mean fitness over all evaluations). 

To evaluate an agent, the ATN graphs are used as follows 
for at most a fixed number of time steps: 

• At the beginning (when the agent is initialized), the 
agent is at the Start node (S). 

• At each time step, the agent crosses an edge: 

1. It computes the set of eligible edges among those 
starting from the current node. An edge is eligible 
when either it has no condition label or all the 
conditions on its label are simultaneously true. 

2. An edge is chosen in this set. The first versions of 
ATNoSFERES were selecting one edge randomly 
(this is the first source of non-determinism), but it 
was found that a deterministic choice (e.g. choose 
the first edge in the list of eligible edges) held bet- 
ter results. If the set of eligible edges is empty, 
then an action is chosen randomly over all pos- 
sible actions, and the current node remains un- 
changed (this is the second source of non-determi- 
nism) . 

3. The actions on the label of the elected edge are 
sequentially performed by the system. Assum- 
ing that only one action can be performed at a 
time, only the last action is actually performed. 
When the action part of the label is empty, an 
action is chosen randomly. This is the third and 
last source of non-determinism. However, in all 
experiments described in this paper, the action 
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swap all — node 
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swap all — node 
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001011 
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roll all — label 
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unroll all — node 
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unroll all — label 


010000 
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010001 
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connect 
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connect 


011000 
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connect self 
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connect start 


011100 


connect start 


011101 


connect start 


011110 


connect end 


011111 


connect end 
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goN! 


100001 


goS! 


100010 


goW! 


100011 


goE! 


100100 


goNE! 


100101 


goSEI 


100110 


goNW! 


100111 


goSW! 
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emptyN? 


101001 


foodN? 


101010 


treeN? 


101011 


emptyS? 


101100 


foods? 
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trees? 


101110 


empty W? 


101111 


foodW? 
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tree W? 
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treeN E? 


110111 
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111000 
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111001 


trees E? 
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emptyNW? 
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foodN W? 


111100 


treeNW? 


111101 


empty SW? 


111110 


foods W? 


mill 


trees W? 



Table 1: The genetic code used in the experiments, for 6-bit codons. This is the mapping from bit- 
strings/integers (in binary representation) to tokens. Some codes for the swap, roll and unroll stack tokens 
have two alternative behaviors, either all or node/ label (see the text, section 13.41) . Also note that, due to 
the ATNoSFERES requirement that there are 2^ tokens for some TV, some tokens are represented more than 
once (e.g. swap all is represented 4 times), inducing a slight bias toward those nodes. 



part of all edges will contain at most one action, 
in order to simplify the comparison with LCSs. 

4. The target node of the elected edge becomes the 
new current node. 

• The agent stops when it reaches the End node (E). 
This node is a general feature of the model and may 
never be reached, either if the agent loops infinitely or 
if the experiment is stopped before. 

3. ATNOSFERES-II 

This section introduces the modifications that have been 
brought to the original ATNoSFERES model described in 
the previous section. In order to be able to assess the ef- 
fects of each modification independently of one another, all 
combinations will be made possible, even if the algorithm 
referred to as ATNoSFERES-II uses them aU. 

3.1 Integer representation 

Instead of coding a genome as a bitstring, a string of in- 
tegers is now used, where each integer is an index encoding 
a token (in the range [0, ^tokens]). There might be re- 
peated tokens, which is the case in our experiments. First, 
this simplifies the translation process (the decoding of a bit- 
string into an integer disappears). However, and because 
crossover was only allowed at the boundary between tokens 
in the bitstring genotype of the original ATNoSFERES, the 
only visible effect of this deep modification is at the mutation 
level: it is now possible to use the uniform mutation oper- 
ator that replaces a given token by another one uniformly, 
i.e. with equal chance for all tokens in the list. 

Another possibility could be to define a distance among 
the tokens based on their semantics (their effect on the struc- 
ture being built), in order to bias the mutation. Such a 
bias would be used to smooth mutation strength by having 
smaller effects on the phenotype change. However, it is clear 
that the bitflip mutation used in the original ATNoSFERES 
was in fact equivalent to such a weighted mutation, but the 
mutation biases were dependent on the order of the tokens 
in the token list (e.g. by having condition and action tokens 
at the second half of the list, the first bit was a flag for condi- 
tion/action tokens). The induced mutation biases were thus 



completely arbitrary with respect to the problem at hand. 
Another important difference is that the total number of en- 
coded tokens no longer has to be 2^^ for some A'^ (where N 
is the number of bits used to represent a token) . 

Note that a similar effect could also have been obtained in 
the bitstring context by increasing the bitstring length and 
decoding each token using a modulo function, as is theoret- 
ically proved in |1U| . 

In the following, and because the only visible effect of this 
change of representation is the actual change of mutation, 
the original ATNoSFERES approach will be referred to as 
BitFlip while the ATNoSFERES-II approach wiU be termed 
Uniform. 

3.2 Default node action 

As stated in section in the original version of ATNoS- 
FERES, when no edge could be elected, an action was cho- 
sen randomly - this was referred to as the second source of 
non-determinism. However, this results in non-deterministic 
performance of the algorithm. 

In ATNoSFERES-II, a drastic strategy is used in such a 
case: when no edge is eligible, the evaluation of the agent 
stops and FAIL is returned. As a consequence, the remain- 
ing number of time steps for the current test is decreased to 
zero, and the fitness for this test is null (the overall fitness is 
the sum over several tests, see section llj. In the following, 
this choice of abruptly ending the test when no edge is eli- 
gible will be referred to as Finish, while the random choice 
of ATNoSFERES wiU be caUed Random. 

We must emphasize that the third source of non-determi- 
nism (performing a random action when crossing an edge 
with no action label) seems experimentally to have much less 
impact on non-deterministic performances. We observed 
that the few such edges still present in the population after 
some generations were never eligible. 

3.3 Contradiction filtering 

In the original version of ATNoSFERES, an edge of the 
control graph could eventually be labeled with contradictory 
conditions, resulting in its permanent ineligibility. Such a 
situation cannot happen in LCSs, since the condition part 
of a classifier cannot contain contradictions. In the present 



work, the occurrence of contradictory conditions in the label 
of edges is prevented by forbidding more than one condition 
specifying the value of the same attribute in the condition: 
once an attribute has been used in a given label, all sub- 
sequent tokens involving that attribute are ignored for that 
label. The use of this mechanism will be referred to as No- 
Contradiction, in contrast with the Contradiction original 
ATNoSFERES algorithm. 

3.4 Behavior of swap/roll/unroll tokens 

In the original version of ATNoSFERES, the swap, roll 
and unroll stack tokens could operate without discrimina- 
tion on any data in the stack, whereas some other stack 
tokens could only operate either on the nodes or on the 
labels: e.g. dup node only operated on nodes, and dup la- 
bel only on labels. This "typing" of tokens is extended in 
ATNoSFERES-II to all available stack tokens, in order to 
facilitate structural manipulations of the automata being 
built. The previous set of the untyped swap/roll/unroll to- 
kens will be referred to as all, while the typed ones will be 
referred to as node/label. 

4. EXPERIMENTAL SETTINGS 

4.1 Representation 

Our experimental set-up is the same as in IH, described in 
sectionIO except for the new features of ATNoSFERES-II. 
We used a 1% bias for mutation probabilities. In particular, 
since one of the goals of the present experiments is to com- 
pare the uniform and the bit-flip mutations on a fair basis, 
the distribution of the set of tokens is also the same (see ta- 
ble0, except that, of course, integer- and not bit-strings are 
used in the Uniform approach of ATNoSFERES-II (section 
13.111 . The other difference, as pointed out in section lies 
in the use of "typed" swap, roll and unroll tokens (see al- 
ternatives in table0, when it comes to compare Node/Label 
against All. 

Capitalizing on previous published experiments, we do not 
start with as many genome lengths as before 13 El- Some 
preliminary tests with lengths of 250, 300 and 350 tokens 
on a subset of parameter combinations demonstrated that, 
in this range, the genome lengths had no statistically signif- 
icant impact on the final performance. From thereon, and 
with an abusive generalization to all environments, we shall 
consider that the average results obtained with this limited 
range of lengths can be extended to other lengths for all 
environments. 

Also, in the next section, the graphical representation of 
the graphs is different from that of in order to make 

the figures more readable (e.g. see figure |5J. First, the 
condition part is presented above the action part. Then, 
each condition is here prefixed by a letter representing the 
perception of the agent: food, tree, or empty. The Start 
node is labeled 0, and the End node is the node labeled with 
the largest number. If no condition is present on an edge, 
any perception will match. Nodes and edges are represented 
as usual by circles and directed arrows. 

4.2 The environments and the fitness 

Three different environments, respectively MazelO El 
and E2 0, have been used to validate and test the mod- 
ifications of ATNoSFERES proposed in section El For all 
these environments, the "optimal policy" is the best policy 



that can be found with the standard limited perception used 
in the so called "woods" experiments |13| without any limit 
on the number of memory bits. The number of steps to find 
the food from each cell given by this optimal policy has been 
presented in 4 . 

For each given environment, each automaton is started 
once in each cell, and its fitness is measured as the average 
number of steps it takes to find the food, averaged over all 
cells (to be minimized). 

5. RESULTS 

Because we did not want to make any assumption about 
the usefulness of each on the modifications to the origi- 
nal ATNoSFERES proposed in section |3 systematic exper- 
iments were conducted in each of the three environments in 
order to evaluate the 16 possible combinations of each of the 
four modifications, with 50 independent runs per setting. 

5.1 Best solutions found 

This section will present the best overall results obtained 
in each environment. In each case, both the automaton 
and a picture of its performance with respect to the optimal 
policy is given. All those automata have been obtained using 
the Uniform and No contradiction modifications - the other 
two will be detailed in each subsection. 

5.1.1 MazelO 




Figure 2: Best automaton found for the MazelO 
environment (5.11 steps to food; an optimal policy 
requires 5.05). 19 edges (3 nodes) that are never 
elected (reached) are not represented, for the sake 
of readability. 




Figure 3: Performance of the best policy found for 
the MazelO environment (by the automaton of fig- 
ure 121). It is only one step worse, and on a single 
cell, than the optimum (dotted circle). The rest is 
optimal. 

In MazelO, the best resuU of ATNoSFERES-II (figure 
has a policy that is is only one step longer than the optimal 
policy (see figure EJ, when the Start cell is just below the 



NE corner of the maze. It was found using 300 codons, and 
the Node/Label and Finish parameters. As far as we can 
tell, no LCS has ever performed so well on this problem. 
The average number of steps to food is 5.11, vs. 5.61 for 
ATNoSFERES in E]. 

5.1.2 El 




Figure 4: Best automaton found for the El environ- 
ment (2.90 steps to food; an optimal policy requires 
2.81). 30 edges (11 nodes) that were never elected 
(reached) are not represented, for the sake of read- 
ability. 
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Figure 5: Best policy found for the El environment 
(by the automaton of figure 1^ . Only two cells are 
only two time steps away from the optimum (dashed 
circles). The rest is optimal. 

In environment El, the best policy found is only 4 steps 
longer than the optimal one (figure |^. It was found with 
a chromosome length of 250, and using the All and Finish 
settings. Again, as far as we are aware, no LCS has ever 
performed so well on this problem: The average number of 
steps to food is 2.90, vs. 3.3 for ACS in 9 . 

5.1.3 E2 




Figure 6: Best automaton found for the E2 environ- 
ment (3.29 steps to food; an optimal policy requires 
2.98). 13 edges (1 node) that were never elected 
(reached) are not represented, for the sake of read- 
ability. 

In environment E2, the best policy found is 15 steps away 
from the optimum (figure0. Its genotype length is 350, and 
again the All and Finish strategies were used. As for the two 
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Figure 7: Best policy found for the E2 environment 
(by the automaton of figure IHll. Three cells are one 
time step away from the optimum (dotted circles), 
three cells are two time steps away from the opti- 
mum (dashed circles), and two cells are three time 
steps away from the optimum (solid circles). The 
rest is optimal. 

other environments, as far as we can tell, no LCS has ever 
performed so well on this problem: The average number of 
steps to food is 3.29, vs. 3.58 for ATNoSFERES in f . 

5.2 Validation of the different modifications 

The noticeable improvements of the off-line best results re- 
ported in the previous section when using ATNoSFERES-II 
have been obtained using (some of) the modifications pre- 
sented in section 121 The present section investigates which 
of the corresponding modifications has a significant impact 
on the overall performance of the algorithm in average. 
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Table 2: Average number of steps to food for each 
parameter value and environment, and T-test p- 
value for each couple of alternative parameter val- 
ues. A p-value lower than 0.05 means a statistically 
significant difference (in bold). 

Table |21 analyzes the results according to each couple of 
parameter values, that is, for each value of a given parameter 
(e.g. Umform or BitFlip), the results are averaged over the 8 
possible values of the other 3 parameters. It clearly appears 
that Uniform mutation produces significantly better results 
than BitFlip in all three environments. The statistical sig- 
nificance is not so clear for the other strategies, except for 
Node/Label and Random for MazelO, and No contradiction 
for El. 

Hence in the remaining of this section. Uniform is as- 
sumed, and the analysis is carried over on the 8 sets of ex- 
periments for which Uniform was used. Table |21 shows the 
results of this analysis, and it appears that Random gives 
significantly better results for MazelO and E2, but not for 
El. Other parameters do not produce significantly different 
results, with the exception of Node/Label for MazelO. 
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Experiments tor ATNoSFERES II/E1 (lengtti=300) 



Table 3: Average mean for each parameter value and 
environment, and T-test p- value for each couple of 
alternative parameter values, assuming Uniform. 
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Table 4: Average mean for each parameter value and 
environment, and T-test p- value for each couple of 
alternative parameter values, assuming Uniform and 
Random. 



Assuming now Uniform and Random, No contradiction 
still gives significantly better results for the El environment 
(see table|3Jl. So even if the Contradiction / No contradiction 
parameters do not produce significantly different results for 
all environments, No contradiction nevertheless gives bet- 
ter average mean results in all experiments, whether con- 
sidered alone, or in conjunction with either Uniform alone 
or Uniform and Random. Furthermore, all the best known 
solutions found for the tested environments use the No con- 
tradiction value. 
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Figure 8: Results for the MazelO environment, ac- 
cording to the mutation and default node action 
variables. See section 15.21 for details and analysis. 

Box plots (see figureslSllSland [TUt are provided to give an 
informal idea of the distributions of the fitnesses for each en- 
vironment, according to the two meaningful variables (mu- 
tation and default node action). These graphics represent 
meaningful statistics rather than all the data: the median 
as an horizontal line in the boxes, rather than the sample 
mean, as the measure of central tendency due to the skew 



--- ACS (BSmax=t ) 

ACS (BSmax=2) 

optimal 



BifFlip.Finisti Uniform. Finisti BiiFlip. Random Uniform. Random 

Figure 9: Results for the El environment, according 
to the mutation and default node action variables. 
See section 15.21 for details and analysis. 

Experiments for ATNoSFERES II/E2 (lengtli=300) 



ACS 
optimal 



Uniform. Finisti BifFlip. Random Uniform. Random 



Figure 10: Distribution of the results for the E2 
environment, according to the mutation and default 
node action variables. See section l5.2l for details and 
analysis. 



in the distribution caused by the fixed lower limit of fitness 
values; the second and third quartiles, taken as the upper 
and lower edges of the boxes; finally, the "whiskers" are plot 
from both quartiles to the extremal values. 

The distribution for ATNoSFERES corresponds more or 
less to the BitFlip/Random case. From these plots, one 
can see that, on the one hand. Uniform mutation clearly 
outperforms BitFlip in all cases, and on the other hand, for 
example for MazelO (see figure that approximately 25% 
of the Uniform/Random solutions outperform XCSMH (1st 
quartile) . 

5.3 Comparison of convergence time with ACS 
and XCSMH 

Table |3 borrowed from ,4, , gives a comparison between 
the performance of the original ATNoSFERES and that of 
some well-known LCSs from the literature in terms of con- 
vergence time. 

From this table, it was clear that, though being more 
general and obtaining sometimes better solutions, ATNoS- 
FERES was several orders of magnitude slower than all 
other tested LCSs. However, since the modifications that 
lead to ATNoSFERES-II significantly improved those per- 
formances, it remains to check again whether this improve- 
ment has a significant impact on the running time of the 
algorithm, and to compare this running time to that of the 
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LCS type 
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LCS 
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NG 


NT 


NT/LCS 


MazelO 


XCSM 


15.1 


7,000 


100 


8.42 


45,000 


6.42 


MazelO 


XCSMH 


6.1 


6,500 


7.30 


4909 


360.10'' 


55,000 


El 


ACS (BS^ax = 1) 


4 


4,400 


30.92 


5115 


218. lO'' 


50,000 


E2 


ACS (BSmax = 2 or 3) 


6.5 


2,000 


83.84 


835 


14. 10** 


7,000 



Table 5: Comparison between the original ATNoSFERES and LCSs. The measure is the average number 
of trials needed to reach the performance given in column "perf.". PR is the percentage of runs where 
ATNoSFERES outperforms the corresponding LCS, and NG is the average number of generations before this 
takes place. Thus, with 300 individuals per generation, the average number of evaluation runs necessary to 
outperform the corresponding LCS is NE — 300* NG * 100/ PR, and the average number of elementary runs NT 
for an environment with NS start cells {NS = 18 for MazelO, 44 for El and 48 for E2) is: NT = MS* NE. Thus 
NT gives the average number of elementary runs needed by ATNoSFERES to outperform the corresponding 
LCS. Finally, NT/LCS gives a good approximation of the factor by which the corresponding LCS is faster 
than ATNoSFERES to reach its best performance. Note that no performance comparison is given on El 
against ACS with BSmax = 2 since ATNoSFERES never outperforms it. 



same LCSs. The results are shown in table |S| and show 
that ATNoSFERES-II is indeed much faster than ATNoS- 
FERES, though still slower than the other LCSs. 

6. DISCUSSION 

The first remark we want to make is that, despite the 
improvement in performance brought by our modifications, 
we still did not succeed in reaching the optimal policy in 
any of the three environments tested here. This illustrates 
how difficult these simple non Markov environments are for 
a genetic-based machine learning system. 

From section 15.21 we can see that the most important 
and significant improvement is the change of representa- 
tion, through the induced change of mutation. The im- 
pressive efficiency of this modification is probably mainly 
due to the codons-to-tokens mapping that was used in the 
original version of ATNoSFERES (table EJ. As a matter of 
fact, one can note that the leftmost bit is an indicator for 
label tokens: tokens enumerated from 100000 to 111111 are 
exclusively action or condition tokens, to be pushed on the 
stack. Therefore, when using a bit-flip mutation over the 
binary encoding of the codons, with a 1% probability, once 
a codon has its leftmost bit set, it has 99% probabilities 
to remain an action or a condition: the combination of the 
mapping and the bit-flip mutation resulted in a very strong 
bias toward labels exploration, with very little structure ex- 
ploration. On the other hand, when using the uniform mu- 
tation over string of integers, structural changes occur with 
probability 0.5 during mutation - and the results show that 
this additional structural exploration significantly improves 
the performance of the algorithm. Further work will look in 
detail at the statistics of every token used by the best solu- 
tions, and should help to find a more efficient distribution 
bias over the different tokens, that will be easy to implement 
in the integer representation. 

The introduction of Finish default action node did not 
prove very useful: either Random performs significantly bet- 
ter, or the difference is not statistically signiflcant. But 
this conclusion is consistent with the general consensus in 
the Reinforcement Learning community, according to which 
some non-determinism is always helpful when tackling Non 
Markov problems, as purely deterministic controllers might 
easily get stuck in local minima. However, in the context of 
evolutionary techniques, non-determinism makes the evalu- 
ation more difficult and may be detrimental to the conver- 



gence of the GA. Some optimal trade-off probably remains 
to be found here. 

The Node/Label set of stack tokens, introduced in order 
to facilitate structural modifications of the automaton be- 
ing built during the interpretation process, only improves 
results in the MazelO environment. Restricting to the Uni- 
form/Random combination, this advantage vanishes (see ta- 
ble]^. More precisely, on MazelO, only restricting to Uni- 
form results in Random still significantly outperforming Fin- 
ish on average (see table EJ , and such is the case too for 
Node/Label with respect to All, with even more significance. 
So, the incompatibility on MazelO lies between Node/Label 
and Random. 

Nevertheless, this phenomenon is difficult to explain. In 
13], it was shown that the kind of structures necessary to 
perform optimally in MazelO, El and E2 are very different 
from one another. It is likely that a difference in the token 
language would result in a different probability of obtaining 
this or that structure, but the corresponding analysis is at 
the moment beyond our reach. If this assumption appears 
to be true, this means that the token language itself has to 
be specifically tuned for each particular problem. 

Quite surprisingly, the Contradiction/Non-Contradiction 
alternative does not systematically make significant differ- 
ence either. As a matter of fact, the difference is significant 
in average fitnesses for both alternatives only on El. It 
could have been expected that biasing the population to- 
wards meaningful condition parts would systematically re- 
duce the search space towards efficient automata, but this 
does not seem to be the case, even if the Non-Contradictory 
language seems to perform slightly better in all experiments. 
Here, we must take into account the fact that the Non- 
Contradictory language induces more introns in the geno- 
type, as many conditions are simply ignored once put upon 
an edge. So again, there is probably a trade-off between 
a more efficient search, thanks to more meaningful edges, 
and degradation of performances because of the genotype 
bloat. Future work will investigate this assumption by try- 
ing much shorter genotypes, leaving less room to introns, 
and/or adding some parsimony pressure to the fitness. If 
our assumption is true, we should observe even better results 
on average when exploring a space biased towards shorter 
genotypes. 

Finally, from the results of section 15.31 it is clear that, 
even if the improvement in performance has a serious impact 
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Table 6: Same comparison as in table Isl between the new version of ATNoSFERES (assuming only Uniform 
mutation and Random default node action), and LCSs. This time, performance comparisons can be given on 
El against ACS with BSmax = 2. 



on convergence time, this impact is still far form sufficient 
to make ATNoSFERES-II competitive in speed with ACS, 
[TTI and XCSMH the most efficient Michigan-style 
LCSs (the case of XCSM can be ignored due to the much 
higher efficiency of XCSMH). We feel that this conclusion 
advocates once again for the more general claim that: 

• the Pittsburgh approaches are often more robust than 
Michigan ones and, given enough time, can often ob- 
tain better performances, but 

• they are generally much slower, which results in poorer 
performance under strong CPU time constraints. 

7. CONCLUSION 

In this paper, we have studied the effect on the perfor- 
mance of several modifications of the Pittsburgh style sys- 
tem ATNoSFERES. Some of these modifications, such as a 
different encoding of the token language, have been demon- 
strated to have a statistically significant impact on the per- 
formance. Other modifications, such as the deterministic 
versus stochastic behavior of the automaton, or the pres- 
ence versus absence of contradictions on the label of edges, 
have given less conclusive results. 

In some cases, the effect of these variations have not been 
clearly explained so far: further investigations are necessary 
to change the assumptions we made about these phenomena 
into unquestionable explanations. 

Anyway, one clear result of this paper is that, thanks 
to some of these modifications, ATNoSFERES-II was able 
to find the best CS-based controllers known so far in the 
MazelO, El and E2 environments. However, it should be 
noticed that this improvement in performance was not ac- 
companied by a clear improvement in convergence speed. 
This last assertion, as well as general considerations known 
as the Michigan vs Pittsburgh debate, are a strong incentive 
to try to design a Michigan- style system based on the same 
representation as ATNoSFERES. We hope that the studies 
conducted here will help in making this future system more 
efficient. 
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